Aditi Raghunathan

Self-Trained Verification for Training- and Test-Time Self-Improvement

May 28, 2026
Via arXiv

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

May 23, 2026
Via arXiv

Base Models Look Human To AI Detectors

May 19, 2026
Via arXiv

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

May 12, 2026
Via arXiv

Annotations Mitigate Post-Training Mode Collapse

May 11, 2026
Via arXiv

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

May 04, 2026
Via arXiv

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Apr 19, 2026
Via arXiv

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

Apr 13, 2026
Via arXiv

Hodoscope: Unsupervised Monitoring for AI Misbehaviors

Apr 13, 2026
Via arXiv

The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data

Mar 17, 2026
Via arXiv